Fast byte sorting algorithm

Introduction

The basic idea of what this algorithm has to do for us is that when the algorithm gets a byte sequence input it has to output the sequence of bytes sorted from smallest to biggest byte Let me give you an example of what am I talking about here, a knowledge of hexadecimal system of numbers would be nice to have for easier understanding:

Let's say your input is 7841345 (ascii), which equals 37383431333435 (hex), the algorithm will sort them from smallest to biggest byte (or you can reference it as number, for easier understanding, if you do not know hex), that means that our output will be 1344578 (ascii) 31333434353738 (hex). Another case is if input is from alphabet, like bdcapx, the output will be in right alphabet order, in given case it will be abcdpx. That means not only numbers are considered/sorted in/by this algorithm but all characters you input.

Now let's get on to how the algorithm works. I originally coded it in asm but it can easily be translated to some high level language, you just have to read the text after the asm code which explains what's going on in the code.

The code example

Here is the code and then I'll explain what is done in it.

; ASM CODE                                ; CODE COMMENTS
mov	esi, offset Input_string	; init(ialize) what we entered
mov	edi, offset Output_string	; init where to put result
mov	ecx, input_string_length	; init length of entered string
mov	ebx, esi			; move input string to ebx also (? - explanation at the end)
xor	edx, edx			; clear edx register (we use this register as counter)

next_byte:				; only a label
inc	edx				; increase counter (next byte to test)
mov	esi, ebx			; move input string to esi (? - explanation at the end)

_search:				; only a label
lodsb					; load byte from input string
cmp	dl, al				; compare taken byte from input string with value in
; counter
jz	write_byte			; does the value in counter match byte taken from input
					; string
cmp	al,0				; are all bytes taken from input string
jz	next_byte			; if all bytes are taken from input string reload input 
; string to esi
jmp	short _search			; try next byte from input string

write_byte:				; only a label
stosb					; write the matched byte into output buffer
dec	ecx				; decrease the input string lenght
jz	the_end			; if all bytes from string are written end search
jmp	short _search			; get back to our search algorithm

the_end:				; only a label
; display Output_string code

; Explanation for - ? we need to load the input string buffer to an independent
register (which is not affected by other parts of algorithm) because of the
instructions we are using, specifically this instruction, lodsb, which loads a
byte from buffer loaded into the esi register and automatically increases it
so, as you can see from the given code, we have to load input buffer into esi
every time we finish one searching cycle (next_byte label)

Code explanation

At the beginning of the code you can see the initialization of the input string (string we wrote, the string from which we want bytes to be sorted), output string (place where the sort result will be stored) buffers and input string length (how long is the input string). We load one by one byte into one register and then compare it with another register which we use as a counter (we start from 1 to +infinite, until we have all of our bytes in right order). We repeat that comparison with the same number in counter for every byte in our input string. If there is no match, then we increase our counter and do the comparison for every byte again, when we have a match we write the byte into our output buffer and decrease the length of string that we've initially loaded into one of registers so we know when all strings are written (that is when the content of that register equals/reaches zero). When we write the matching byte we do not increase our counter immediately but only when all bytes in input string are compared to counter value (there might be the same byte used further in the input string and we do not want to miss that), if we find the same byte again at some other place in the string we write it down next to the previous one (we decrease the length of remaining bytes in the string also). When everything is done we just need to display the content of our output string buffer and we are done.

Conclusion

This was written just to give you another idea on how can you make fast sorting algorithm. I hope you got the idea behind this and if you did, coding your own algorithm (in any coding language) shouldn't be hard for you, even if you are beginner at coding. Just in case I included a working example, so you can check how algorithm really works in practice.

I wrote this article when I had lack of will for studying and I just felt like seeing my name in Hugi ;-) Although I don't have Internet access anymore at my apartment (nice student life), just at my university (SFSB, Croatia), I will try to write a bit more than one article, by then regards and respects.

Mario Suvajac, SFSB, Croatia
05.11.2003.